Entropy analysis of word-length series of natural language texts: Effects of text language and genre

Authors

  • Maria Kalimeri
  • Vassilios Constantoudis
  • Constantinos Papadimitriou
  • Konstantinos Karamanos
  • Fotis K. Diakonos
  • Harris Papageorgiou
Abstract

We estimate the n-gram entropies of natural language texts in word-length representation and find that these are sensitive to text language and genre. We attribute this sensitivity to changes in the probability distribution of the lengths of single words and emphasize the crucial role of the uniformity of probabilities of having words with length between five and ten. Furthermore, comparison with the entropies of shuffled data reveals the impact of word length correlations on the estimated n-gram entropies.
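The quantities named in the abstract can be illustrated with a minimal sketch. It assumes a simple letter-run tokenizer, overlapping n-gram windows, and the plug-in (relative-frequency) Shannon estimator in bits; the paper's actual tokenization and estimator may differ, and the function names `word_lengths`, `block_entropy`, and `shuffled_entropy` are hypothetical.

```python
import math
import random
import re
from collections import Counter


def word_lengths(text):
    """Map a text to its word-length series (one integer per word).

    Assumption: a word is a maximal run of Unicode letters.
    """
    return [len(w) for w in re.findall(r"[^\W\d_]+", text)]


def block_entropy(series, n):
    """Plug-in Shannon entropy (bits) of the overlapping n-grams of a series."""
    grams = [tuple(series[i:i + n]) for i in range(len(series) - n + 1)]
    total = len(grams)
    if total == 0:
        return 0.0
    counts = Counter(grams)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def shuffled_entropy(series, n, seed=0):
    """n-gram entropy of a randomly shuffled copy of the series.

    Shuffling keeps the single-word length distribution but destroys
    correlations between neighbouring word lengths.
    """
    shuffled = list(series)
    random.Random(seed).shuffle(shuffled)
    return block_entropy(shuffled, n)
```

For a word-length series `s`, the gap between `block_entropy(s, n)` and `shuffled_entropy(s, n)` for n > 1 gives a rough indication of how much word-length correlations contribute, while for n = 1 the two values coincide because shuffling leaves the length distribution unchanged. For short texts the plug-in estimate is biased low at larger n, so such comparisons are only meaningful on long series.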

Similar resources

EFL Textbook Evaluation: An Analysis of Readability and Vocabulary Profiler of Four Corners Book Series

This study aimed to investigate whether there is any significant relationship between the readability and vocabulary profile including the most frequent words (K1 words) and academic word list (AWL) of reading passages of Four Corners series which were EFL textbooks. To determine the readability of the texts, the Flesch–Kincaid (1975) readability test was used, while the texts' academic word li...

The Intertextuality in an English as a Foreign Language Textbook: An Analytical Study of Interchange Fourth Edition

This study investigated the utilization of intertextuality in the fourth edition of the Interchange book series for English as Foreign Language (EFL) Learners using Fairclough’s (1992) framework. Ten texts were randomly chosen among the reading passages of the Interchange book series and later analyzed regarding intertextuality kinds and methods of reporting. Findings indicated that two types o...

Producing a Persian Text Tokenizer Corpus Focusing on Its Computational Linguistics Considerations

The main task of the tokenization is to divide the sentences of the text into its constituent units and remove punctuation marks (dots, commas, etc.). Each unit is a continuous lexical or grammatical writing chain that is an independent semantic unit. Tokenization occurs at the word level and the extracted units can be used as input to other components such as stemmer. The requirement to create...

The Effect of Extensive Reading on Iranian EFL Learners’ Lexical Bundle Performance: a comparative study of adaptive and authentic texts

Formulaic language and sequence as the core characteristic of real-life language and native-like fluency, has been a subject of inquiry in recent decades. The aim of the present study is to investigate the effects of two extensive reading text types, i.e., adaptive and authentic, on Iranian EFL learners’ development of lexical bundles. To this aim, 20 intermediate EFL learners were chosen to pa...

Journal:
  • I. J. Bifurcation and Chaos

Volume 22  Issue

Pages  -

Publication date 2012